An Intelligent Agent for Web Advertisements

Vincent Ng¹ and Kwan-Ho Mok
Department of Computing, Hong Kong Polytechnic University, 
Kowloon, Hong Kong, China 



Abstract 

The rapid growth in the number of Internet users attracts advertisers to post their advertisements on the Internet. The probabilistic selection algorithm has not been satisfactory, while other advertising agents are unable to guarantee quality because of insufficient and unstable user information. This paper describes a new advertising agent based on user information. The users' interests are first discovered by the Order Pattern Mining algorithm, and a Gaussian curve transformation is then applied to represent their profiles. For the advertisements, we use keywords from different categories to construct the advertisement profiles, also as Gaussian curves. This allows us to select advertisements effectively and efficiently, based on the intersections of the different profiles, according to the users' preferences. A prototype of the Intelligent Advertising Agent has been developed with Java and Oracle. From our evaluations, we observed that about 80% of the test cases succeeded in predicting the category in which the users are most interested.



1. Introduction 

With the rapid growth in the number of Internet users, the World Wide Web has become a good way of presenting information to the public. According to Input, a global IT market intelligence firm, the population connected to the Internet will grow from 30 million to more than 200 million by the year 2000. More than 30,000 items of business information have been posted on the Internet, and the number is expected to double in 1998 [1]. However, the advertising mode has remained unchanged and is, in essence, similar to that used in TV and newspapers.

On the Internet, advertisements usually take the form of clickable banners. When a user clicks on a banner, the web browser contacts the web server, which returns the URL address of the designated advertiser. The URL address could be either the home page of the advertiser or a special site which encourages potential customers to visit and to get more detailed information. Obviously, Internet advertising is more attractive than traditional methods because of its much lower cost. Further, advertising providers can offer operations [2] such as

• determining the return on investment of on-line advertising dollars

• counting how many customers have clicked on the advertisements

• on-line, centralised and real-time reporting

• real-time banner substitutions

• sophisticated tracking and reporting, such as click rates

• comprehensive advertisement targeting capabilities on the Internet, based on, for example, the browser type or search keywords

¹ The work of the authors was supported in part by the Hong Kong RGC Grant: PolyU 87/96E.

Although an advertising provider has the potential to give the above sophisticated features to advertisers, most current systems do not consider the interests of their users and only display advertisements according to their own ideas of users' interests. In this paper, we suggest a new approach for creating an intelligent agent for advertising on the Internet.

This paper is organised into six sections. Section 2 dis- 
cusses some of the recent approaches in Internet advertise- 
ments and information filtering. Section 3 discusses our 
design of the Intelligent Advertising Agent (IAA). Section 
4 will cover the implementation of the agent. Section 5 
presents our experiments on the IAA prototype and Section 
6 concludes our work and discusses the possible enhance- 
ments. 

2. Recent Work 

One basic paradigm mentioned in [4,5,6] for Internet advertising decisions is an uninformed approach which selects an advertisement according to a probability based on the number of purchased units. Its problems have been identified as follows:

• A user may see advertisements that are unrelated to his 
current interests. 

• A user wastes a lot of time and money downloading information of no interest.

• Unexpected advertisements irritate users in much the same way as a magazine article that is split up by intervening advertisements.

• When a user accesses a Web page at different times, the same advertisements are shown. This not only wastes downloading time but also reduces the effectiveness of the advertisement.

One company, DoubleClick [7], is the first to offer advertisers the ability to dynamically target advertisements on the Web. It developed the DART targeting technology, which offers four basic categories of targeting criteria: content targeting, behavioral targeting, user targeting and technical targeting. For DoubleClick and other similar systems, the advertisement selection process is no different from a typical information filtering application. The idea is to select the information that a user prefers, that is, according to a user profile.

In content-based filtering, a document representation can be derived from the document contents. Yan implemented a simple content-based text filtering system for Internet News articles called SIFT [9]. Profiles for SIFT are constructed manually by specifying words to prefer or avoid. SIFT offers two facilities to assist users with profile construction. First, users are initially offered an opportunity to apply candidate profiles against the current day's articles to determine whether appropriate sets of articles are accepted and rejected. Second, to facilitate maintenance of profiles over time, words which contributed to the position of each article in the ranked list are highlighted when a web browser is used to access the articles. By examining the context of words with meanings that were unforeseen at the time the profile was constructed, users can select additional words which appear in the same context to add to the lists of words to be avoided.

Stevens, adopting implicit acquisition, developed a system called InfoScope, also designed for Internet news. It used automatic profile learning to minimise the complexity of exploiting information about the context in which words were used [11]. InfoScope deduced exact-match rules and offered them for approval by the users. These suggestions were based on simple observable actions, such as the time spent reading a newsgroup or whether an individual message was saved for future reference, to determine the user's interests. In this way, a user profile can be built up for information filtering.



There are limitations in InfoScope. The experimental system Stevens developed is able to process only information in the header of each article (e.g. subject, author, or newsgroup). Another explicit acquisition model has been developed jointly by Alfred Kobsa, Andreas Will, and Josef Fink. They use KN-AHS with BGP-MS [12, 10] to demonstrate the feasibility of user modeling for information filtering. BGP-MS is an adaptive user modeling shell system. It utilises the partition mechanism SB-PART [11], which allows different types of assumptions about a user to be represented simultaneously, but still separately. These include assumptions concerning a user's knowledge or goals, application-relevant characteristics of user subgroups (so-called 'stereotypes') and domain knowledge of the user modeling component.

With the exception of InfoScope, every system we have described requires a user to evaluate documents explicitly. Explicit feedback has the advantage of simplicity, and can minimise one source of experimental error, namely the inference of the user's true reaction. Because the capability of the existing algorithms is insufficient, we take this opportunity to propose a new approach to advertisement selection. The new solution does not depend on a user's explicit input. The agent automatically extracts the personal information and, based on it, selects advertisements in which the user might be interested.

3. Designing the Internet Advertisement Agent 

In designing the Intelligent Advertisement Agent (IAA), we have adopted the content-based filtering approach. We observe that, although several types of information are available on the Internet, the major type is still text. Selecting an advertisement from a list of advertisements can be considered an information retrieval process. As Belkin and Croft observed, content-based text selection techniques have been extensively evaluated in the context of information retrieval [13]. In the following sections, we discuss the design of the IAA in detail with respect to its four components.

3.1. Advertisement Context Representation 

As discussed previously, we use the content of advertisements to match against the information that has been read by a visitor. In text retrieval, this content is usually represented in the form of keywords or word phrases. The advantage of a word phrase is that its meaning is very precise, thus reducing ambiguity in indexing and searching. The disadvantage is that we have to be aware of the phrase construction rules employed. The knowledge base can adopt an intermediate way by allowing both phrases and single words. This type of knowledge base is usually constructed manually, as automatic phrase construction is still difficult [8,15]. Furthermore, manual construction can guarantee the specificity of keywords in representing their categories. In our design of the IAA, we use the intermediate approach.

Figure 1. The number of keywords in each advertisement category.

3.1.1 Advertisement Category 

It is hard to decide on the possible advertisement categories. If there are too few categories, they cannot clearly distinguish users' interests, while too many categories will result in very few keywords in each of them. Initially, we adopted the advertisement categories from the classified section of a local newspaper, the South China Morning Post. We then combined these advertisement categories with those found in Internet directories such as [3,19]. Figure 1 shows the different categories used in our final set.

3.1.2 Weighting Factor for Keywords

Another piece of information for content-based text filter- 
ing is the term-frequency which counts the number of times 
a keyword appears. The term-frequency can also be used to 
represent a keyword's importance. One technique is to rep- 
resent each advertisement as a vector of TFIDF (Term Fre- 
quency Inverse Document Frequency) in the space of words 
that appeared in a set of training advertisements [14]. How- 
ever, in a text filtering system, advance knowledge of the 
number of advertisements with a particular term is clearly 
not possible. Estimates are based on sampling earlier ad- 
vertisements to produce useful inverse document frequency 
values for domains in which term usage patterns are rela- 
tively stable [8]. In our design, we adopt a simple algorithm 
to generate the weighting factor. The weighting factor for 
a keyword in each advertisement category is calculated by 
the following equation: 

$$\mathit{WeightFactor}_i = \frac{F_i}{T} \qquad (1)$$

where $F_i$ is the number of occurrences of term $i$ in the $N_k$ advertisements of category $k$, and $T$ is the total number of occurrences of all terms in the $N_k$ advertisements of the same category.
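As an illustration of equation (1), the following minimal sketch computes the weighting factors for one category. The function name `weight_factors`, the whitespace tokenisation and the sample advertisements are assumptions made for this example only; the sketch also assumes that "all terms" means the knowledge-base keywords and that every keyword is a single word.

```python
from collections import Counter

def weight_factors(category_ads, keywords):
    """Return F_i / T for each keyword i found in one category's advertisements.

    Assumption: T counts only occurrences of knowledge-base keywords,
    and keywords are single words split on whitespace.
    """
    counts = Counter()
    for text in category_ads:
        for word in text.lower().split():
            if word in keywords:
                counts[word] += 1          # F_i for this keyword
    total = sum(counts.values())           # T for the category
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}

# Hypothetical advertisements of a single category.
ads = ["fast modem and printer", "printer ink sale", "laser printer"]
weights = weight_factors(ads, {"modem", "printer", "ink", "laser"})
```

Because every factor is divided by the same total, the weights of one category sum to one, which makes categories with different numbers of advertisements comparable.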

3.2. Profile Construction 

In an information filtering system, the system's representation of the information needed is commonly referred to as a 'profile'. It would not be technically correct to call the profile a 'user model', because a user model consists of both a representation of a user and a method for interpreting that representation to make predictions. Here, we use content-based filtering with a new model for user and advertisement profiles to achieve this task.

3.2.1 Gaussian Model 

Many personalized systems use a single scalar to rank a user's interest in a document, such as in [16]. Usually, a high value indicates a high level of interest, whereas a low value indicates that the document is unimportant to the user. However, this approach does not capture the broadness of a user's interest. A better technique, as proposed in [17], is to use two values: one value for the median ($\mu$) of the user's interest, on a scale from highly interested to total dislike, and a second value to describe the broadness ($\delta$) of the user's interest.

3.2.2 Gaussian Curve Model for Advertisement and 
User 

We can imagine that a user may have different interests in different advertisement categories. Therefore, a user profile should include several Gaussian curves to represent the user's interests in different categories. Each Gaussian curve for a category in a user profile is called a category-profile. That is, if there are m advertisement categories, there are m category-profiles for the user. With a Gaussian curve, we can roughly represent three levels of interest:

• Dislike of the advertisement

• Indifference to the advertisement

• High interest in the advertisement

It is possible to have nine combinations by mapping the user and advertisement profiles. Yet, when an advertisement belongs to a certain category, it should have a high score already. Therefore, the possible mappings can be reduced to three.

3.2.3 Advertisement Category Profiles 

When we read advertisements, we often find that the same keyword appears frequently in different advertisements of the same category. Furthermore, some advertisements may fit in different categories as well. This means that one keyword can appear in different categories and one advertisement can belong to more than one category. Therefore, rather than using distinct sets of keywords for different advertisement categories, we allow a keyword to be used for several categories, possibly with different weights.

An advertisement curve for each category represents the characteristics of the advertisements in that category. When advertisers want to show their advertisements in the Web pages of an ISP (Internet Service Provider), they make contact with the ISP. An advertiser specifies his target groups and allocates his advertisements to the matching categories. To identify an advertisement's characteristics, the keywords in the advertisement are extracted. The extracted keywords are matched with those of the same category in the knowledge base. Each keyword in the knowledge base is associated with a weighting factor, which is discussed in Section 3.1.2. When keywords are matched, the distribution of their weights represents the characteristic of the advertisement. After processing all advertisements in the same category, we use a Gaussian curve (normal distribution curve) to represent its profile [18].
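The text does not state how the weight distribution is turned into the two curve parameters. The sketch below is therefore only one possible reading: it assumes that $\mu$ is the mean and $\delta$ the population standard deviation of the weights of all matched keywords in a category. The function name `category_profile` and the sample knowledge base are hypothetical.

```python
from statistics import mean, pstdev

def category_profile(ads_keywords, knowledge_base):
    """Fit (mu, delta) to the weights of the keywords matched in one category.

    Assumption (not stated in the paper): mu is the mean and delta the
    population standard deviation of the matched weighting factors.
    """
    matched = [knowledge_base[word]
               for keywords in ads_keywords
               for word in keywords
               if word in knowledge_base]
    if not matched:
        return None                      # no keyword matched: no profile
    return mean(matched), pstdev(matched)

# Hypothetical weighting factors and extracted keywords of two advertisements.
kb = {"printer": 0.5, "modem": 0.25, "ink": 0.25}
profile = category_profile([["printer", "modem"], ["printer", "ink", "sale"]], kb)
```

The same function could be applied to the keywords extracted from a user's visited pages, which is how the user category profiles of Section 3.2.4 mirror the advertisement profiles.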

3.2.4 User Profiles 



The concept of a user profile is similar to that of an advertisement profile. The difference is that the keywords are extracted from the URLs visited by a user, because the content of these URLs (uniform resource locators) indirectly represents the user's interests. Most WWW servers keep access logs to track their utilization. Each line in the log file indicates one access of a particular Web page. This is only interesting from the view of a single server, not of individual users. In fact, we are interested in the access patterns of web pages whose total viewing time is more than a threshold or which contain at least a certain number of Web pages. Here, we adopt the OPM (Order Pattern Mining) algorithm to discover the access patterns that indicate users' interests [20].

Figure 2. An access pattern.

In order to work with the OPM algorithm, we need to identify the Web pages that are accessed by an individual user, or even in a single session of an individual user. A normal WWW access log records the client IP address, the names of the html pages and the access times of those pages. With this information, our identification method is based on the following assumptions:

• The server can be a WWW server or a proxy server as 
long as access logs can be recorded. 

• Only one user is active on a single IP address at a time. 

• When a user has been idle for a long time (say 1 hour), 
the current session ends. 

• The local cache in the client machines has been disabled, so every Web page access needs to be served by the server.

Each session of a user is transformed into an access transaction. For example, in Figure 2, User A accessed the Web pages in the sequence P2 → P4 → P6 → P4 → P5 → P7. The access sequence forms a list of ordered pairs in the format $(P_i, d_{ij})$, where $P_i$ is the page label (number) and $d_{ij}$ is the viewing time of the page. Each ordered pair of a transaction $T_i$ is denoted $W_{ij}$. For the example in Figure 2, the access transaction is {(2,7), (4,5), (6,5), (4,5), (5,2), (7,9)}.
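The session identification and the conversion into access transactions can be sketched as follows. This is a minimal illustration under the assumptions listed above (one user per IP address, a one-hour idle limit); the record layout `(ip, start_time, page, viewing_time)`, the function name `access_transactions` and the sample log are hypothetical, and the viewing time is assumed to be available for every record.

```python
from collections import defaultdict

IDLE_LIMIT = 3600  # seconds; the "say 1 hour" idle limit from the assumptions

def access_transactions(log, idle_limit=IDLE_LIMIT):
    """Group (ip, start_time, page, viewing_time) records into sessions and
    return a list of (ip, [(page, viewing_time), ...]) access transactions."""
    by_ip = defaultdict(list)
    for ip, start, page, viewed in sorted(log, key=lambda r: (r[0], r[1])):
        by_ip[ip].append((start, page, viewed))

    transactions = []
    for ip, records in by_ip.items():
        current, last_end = [], None
        for start, page, viewed in records:
            # A long idle period closes the current session.
            if last_end is not None and start - last_end > idle_limit:
                transactions.append((ip, current))
                current = []
            current.append((page, viewed))
            last_end = start + viewed
        transactions.append((ip, current))
    return transactions

# User A's session from Figure 2, followed by a visit after a long idle period.
log = [("10.0.0.1", 0, 2, 7), ("10.0.0.1", 7, 4, 5), ("10.0.0.1", 12, 6, 5),
       ("10.0.0.1", 17, 4, 5), ("10.0.0.1", 22, 5, 2), ("10.0.0.1", 24, 7, 9),
       ("10.0.0.1", 10000, 3, 4)]
result = access_transactions(log)
```

The first transaction reproduces the ordered pairs of the Figure 2 example, and the late visit forms a second transaction of its own.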

From the access log, we identify the sessions, convert them into access transactions and then prepare them for the mining of user behaviors. After these transformations, we can apply the OPM algorithm to solve the following problem.

Problem 1 For all access transactions in a given database D, extract all the frequent access patterns of size k ($AP_k$'s) for different users such that each equivalent access pattern $S^i_k$ of an $AP_k$ has a total viewing time of at least F, where F is a threshold value representing the minimal viewing time of the pattern.

The extracted access patterns reflect the interests of users. Keywords from the web pages in the access patterns of a user are matched with the keywords of each advertisement category. After the matching process has been completed for all advertisement categories, we construct the user category profiles. For example, a user profile for a category is represented by the Gaussian curve in Figure 3 with $\mu$ = 0.3 and $\delta$ = 0.5.

Figure 3. Calculation of overlapping area by portion.

1. Let $Up_1, \ldots, Up_k$ be the category profiles (Gaussian curves) of user $U$.

2. For all advertisement categories ($C_1, \ldots, C_t$), find the one category ($C_i$) that has the maximum overlapping area $C_i \cap Up_i$.

3. For all advertisements in $C_i$, let $A_{i1}, \ldots, A_{i5}$ be the five advertisements with the highest scores calculated by

• score = OverlapArea($Up_i$, $A_{ij}$) × $NU_{ij}$, where $NU_{ij}$ is the percentage of unused units of advertisement $A_{ij}$ over the sum of all unused units in advertisement category $i$.

4. Randomly select one of the five advertisements for the user. Decrease the number of unused units of that advertisement by one.

Figure 4. Algorithm to select an advertisement.

3.3. Profile Comparisons

From the last two sections, Gaussian curves are used to model a user's interests and an advertisement's characteristics. Therefore, we can make a selection by measuring the amount of overlap. That is, if the overlapping area is large, the two profiles are closely correlated, and vice versa. An example of the comparison is shown in Figure 3.

Obviously, the areas under two Gaussian curves may have different sizes and shapes. One calculation method is to discretize the parameters and store the results in a table for quick retrieval later. The first step is to transform all the observations of a normal random variable $X$ into a normal variable $Z$ with mean zero and variance one. This can be done by the transformation

$$Z = (X - \mu)/\delta \qquad (2)$$

To calculate the overlapping area of two curves, we find the intersection points for the given $\mu_1$, $\delta_1$, $\mu_2$ and $\delta_2$ by equating the two Gaussian densities and solving for $X$:

$$\frac{1}{\delta_1\sqrt{2\pi}}\, e^{-(X-\mu_1)^2/(2\delta_1^2)} = \frac{1}{\delta_2\sqrt{2\pi}}\, e^{-(X-\mu_2)^2/(2\delta_2^2)} \qquad (3)$$

For the example in Figure 3, the X values of the intersection points are 0.217 and 0.875. The corresponding values $Z_1$ and $Z_2$ for the curve with square symbols are

$$Z_1 = (0.217 - 0.3)/0.5 = -0.166 \qquad (4)$$

and

$$Z_2 = (0.875 - 0.3)/0.5 = 1.15 \qquad (5)$$

The corresponding values $Z_3$ and $Z_4$ for the second curve (with diamond symbols) are -1.304 and 1.728, respectively. From the Z-value table in [18], the corresponding areas ($A_1$, $A_2$, $A_3$, $A_4$) for $Z_1$, $Z_2$, $Z_3$ and $Z_4$ are 0.4325, 0.8749, 0.0968 and 0.9528, respectively. Therefore, the overlapping area of the two curves can be found as the sum of the areas 'B', 'C' and 'D' in the figure, and it is equal to

$$\text{Total Area} = A_3 + (A_2 - A_1) + (1 - A_4) = 0.5864. \qquad (6)$$
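The overlap calculation of this section can be checked numerically with the short sketch below. It is an illustration rather than the prototype's code: it replaces the Z-value table with the error function, it assumes the two curves have different spreads (so that they cross exactly twice), and the function names are hypothetical. The parameters of the second curve, $\mu_2$ = 0.5 and $\delta_2$ = 0.217, are not given in the text; they are inferred here from the quoted values of $Z_3$ and $Z_4$.

```python
import math

def phi(z):
    """Standard normal cumulative distribution (stands in for the Z-value table)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def intersections(mu1, d1, mu2, d2):
    """X values where two Gaussian densities with different spreads cross.

    Taking logarithms of both densities gives a quadratic a*x^2 + b*x + c = 0.
    """
    a = 1.0 / (2 * d1 ** 2) - 1.0 / (2 * d2 ** 2)
    b = mu2 / d2 ** 2 - mu1 / d1 ** 2
    c = mu1 ** 2 / (2 * d1 ** 2) - mu2 ** 2 / (2 * d2 ** 2) - math.log(d2 / d1)
    root = math.sqrt(b * b - 4 * a * c)
    return sorted(((-b - root) / (2 * a), (-b + root) / (2 * a)))

def overlap_area(mu1, d1, mu2, d2):
    """Area under the lower of the two curves (requires d1 != d2)."""
    x_lo, x_hi = intersections(mu1, d1, mu2, d2)
    # Outside the crossings the narrower curve is lower; between them the wider one is.
    (mw, dw), (mn, dn) = ((mu1, d1), (mu2, d2)) if d1 > d2 else ((mu2, d2), (mu1, d1))
    left = phi((x_lo - mn) / dn)
    middle = phi((x_hi - mw) / dw) - phi((x_lo - mw) / dw)
    right = 1.0 - phi((x_hi - mn) / dn)
    return left + middle + right

x_lo, x_hi = intersections(0.3, 0.5, 0.5, 0.217)
area = overlap_area(0.3, 0.5, 0.5, 0.217)
```

With these inputs the crossings fall close to 0.217 and 0.875, and the area comes out at roughly 0.58. The small difference from the table-based 0.5864 is expected, because the sketch does not round the Z values to the entries of a printed table.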

3.4. Selecting an Advertisement 

We can now derive a measure to indicate if a user will 
accept an advertisement category for his viewing. In the 
IAA, an advertisement selection algorithm is developed as 
shown in Figure 4. It considers the advertisement charac- 
teristics and the display units bought for an advertisement. 
In the algorithm, we use the product of the advertisement 
score (overlapping area) with the percentage of the unused 
display units for all advertisements in the selected category. 
From the five highest scored advertisements, we randomly 
select one for the user. In this way, it tries to prevent from 
'banner burnout 1 and maintains healthy click-through rates. 
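The selection steps of Figure 4 can be sketched as follows. The overlap areas are passed in as precomputed numbers so that the example stays self-contained; the function name `select_advertisement`, the dictionary layout and the sample inventory are assumptions for illustration. The unused share is kept as a fraction rather than a percentage, which does not change the ranking.

```python
import random

def select_advertisement(category_overlaps, ads_by_category, rng=random):
    """Pick one advertisement for a user following the four steps of Figure 4.

    category_overlaps: {category: overlap of user and category profiles}
    ads_by_category:   {category: [{"id", "overlap", "unused"}, ...]}
    """
    # Step 2: the category whose profile overlaps most with the user's.
    best = max(category_overlaps, key=category_overlaps.get)
    ads = [ad for ad in ads_by_category[best] if ad["unused"] > 0]
    if not ads:
        return None
    total_unused = sum(ad["unused"] for ad in ads)
    # Step 3: score = overlap area x share of unused display units.
    ranked = sorted(ads,
                    key=lambda ad: ad["overlap"] * ad["unused"] / total_unused,
                    reverse=True)
    # Step 4: random choice among the five best, then consume one unit.
    chosen = rng.choice(ranked[:5])
    chosen["unused"] -= 1
    return chosen["id"]

# Hypothetical overlap areas and advertisement inventory.
overlaps = {"software": 0.59, "finance": 0.05}
inventory = {
    "software": [{"id": f"s{n}", "overlap": 0.6, "unused": 10} for n in range(1, 6)]
                + [{"id": "s6", "overlap": 0.1, "unused": 1}],
    "finance": [{"id": "f1", "overlap": 0.9, "unused": 10}],
}
picked = select_advertisement(overlaps, inventory, random.Random(0))
```

Weighting by the unused share means that an advertisement which has already consumed most of its purchased units drops in the ranking, while the random choice among the top five avoids showing the same banner on every visit.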



4. IAA Development 

The system architecture of the intelligent advertising agent (IAA) is shown in Figure 5. It has four main modules:

• Text Retrieval Module

• Profile Modeling Module

• Advertisement User Matching Module

• Advertisement Display Module

Figure 5. IAA Architecture.

Figure 6. Users evaluating IAA.

Figure 7. Advertisements to evaluate IAA.

A prototype of the IAA has been developed with Oracle and Java. During the implementation, we wanted to reduce our development effort while ensuring that the prototype had sufficient features to demonstrate the effectiveness and accuracy of our IAA design. Therefore, we have relaxed our requirements as follows.

• The prototype is tested offline. That is, advertisements are not shown during regular web browsing.

• User web browsing histories (preferences) are collected either via a proxy server or manually. This supports the creation of our user profiles.

• Advertisements and their contents are acquired either via the popular Internet gateways or manually.

The knowledge base is the key area in the system. The keywords in it are used to generate the user and advertisement profiles. In the pre-processing step, a set of advertisements (URLs) is collected and their profiles are generated for the later matching process. In calculating the overlaps of the profiles, we use the one-byte representation scheme as in [17]. A single byte is used to pack the two parameters, $\mu$ and $\delta$, of a profile after they have been discretized into a range from 0 to 255. Pre-calculated values are stored in the Gaussian lookup table and the Z-value table. This allows us to find the overlapping area by a simple 3-way join over the tables.
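The exact bit layout of the one-byte scheme is not described here. The sketch below shows one possible reading, in which $\mu$ and $\delta$ are each assumed to lie in [0, 1] and are quantised to 4 bits, so that the packed value fits in the range 0 to 255. The function names and the 4-bit split are assumptions for illustration only.

```python
LEVELS = 15  # largest value a 4-bit field can hold (assumed 4 + 4 bit split)

def pack_profile(mu, delta):
    """Quantise mu and delta (both assumed to lie in [0, 1]) into one byte."""
    q_mu = round(min(max(mu, 0.0), 1.0) * LEVELS)
    q_delta = round(min(max(delta, 0.0), 1.0) * LEVELS)
    return (q_mu << 4) | q_delta        # high nibble: mu, low nibble: delta

def unpack_profile(code):
    """Recover the approximate (mu, delta) pair from a packed byte."""
    return (code >> 4) / LEVELS, (code & 0x0F) / LEVELS

code = pack_profile(0.3, 0.5)
approx_mu, approx_delta = unpack_profile(code)
```

Under this reading, each recovered parameter can differ from the original by up to 1/30, which illustrates why a lookup table keyed on such coarse values can disagree with an exact calculation.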

5. Evaluation 

In our experiments, we would like to measure the predic- 
tive performance of the IAA prototype by the acceptance of 
the suggested advertisements for our users. In doing the ex- 
periments, we focused on three main aspects: keyword ex- 
traction, matching model for predictions and the accuracy 
of the pre-calculated tables. 



At the beginning, after completing our first IAA prototype, we invited a group of students at Hong Kong Polytechnic University and colleagues of the two authors to evaluate it. Unfortunately, there were only 7 responses. The poor response may have been due to the timing, as the invitations were sent close to the end of the term. We then tried again in the next study term, this time also drawing on other sources: secondary schools and other universities. The final distribution of users evaluating the IAA prototype is shown in Figure 6. We believe that our test users are relatively computer literate. Also, because of limited resources, we decided to use only 5 advertisement categories, which have some relationship to computing, in our experiments. Advertisements for these five categories were extracted via the Internet directory home pages [3,19]. Their categories are the same as those in which they were originally placed in the Internet directory. The advertisement categories and their profiles are shown in Figure 7.

5.1. User Profiles 

Our test cases fall into two batches. The first batch consists of only 7 users. We asked these users to fill in a questionnaire before trying the IAA. The questionnaire has two sections. The first section asks a user to provide 18 web pages that he usually visits. The second section asks the user to assign weights to the 5 advertisement categories, so that we can verify which advertisements the user is usually interested in. After receiving the survey forms, we divided the advertisement categories into two groups. One was the targeted advertisement categories, including computer hardware, software and Internet. The other was the uninterested advertisement categories, including finance and audio.

For the second batch, we obtained the access logs from the users' web masters with the permission of the users. The OPM algorithm was applied to discover the access patterns and hence the web pages that those users visit frequently. As in the first batch, we also asked the users to rank the advertisement categories manually.

For all users in our experiments, user profiles are constructed according to the content of the web pages found. As there are five advertisement categories in our experiments, only the profiles belonging to these five categories are listed. The $\mu$'s and $\delta$'s of the user category profiles for the first 3 users are shown in Figure 8.

Figure 8. Generated user profiles.

5.2. Prediction Accuracy 

Each user profile is matched with the advertisement profiles to determine which advertisement categories are to be selected. The matching results of 4 different groups of the test cases are shown in Figure 9. The advertisement categories recommended by the IAA to users are marked with '*'. '1' represents total overlap, whereas '0' represents an overlap close to zero.

In order to improve the performance of calculating the overlapping areas, all possible discrete values of two Gaussian curves are saved in a database table. In our experiments, we compared the results of the approximations with the exact calculations, as shown in Figure 10. We observed that there are discrepancies, such as for Case 7. We believe this is due to round-off errors, as single bytes are used in the pre-calculated tables. Instead of asking the user to verify the selections, we compare the order of the five selected categories (the 5 largest overlapping areas) with the same five categories as ranked by the user in the questionnaire. Suppose the five selected categories are $C_1, \ldots, C_5$ and the ranked categories are $R_1, \ldots, R_5$. From these two orderings we compute the prediction correctness $P_c$. When $P_c$ is 1, the prediction is perfect. When it is 0, the prediction is completely inaccurate.

Figure 9. Partial results of profile overlapping.



5.3. IAA Prototype Performance 

From our evaluations, we observed that about 16% of our cases have discrepancies between the exact and approximated calculations. We therefore conclude that the precision of the pre-calculated overlapping area technique is insufficient. If the precision is improved, the prototype can produce more accurate predictions. To do that, the discrete values of the Gaussian curve should use two bytes instead of a single byte. Figure 10 shows the prediction performance of the IAA prototype using the exact results. Overall, the predictions are quite good except for the secondary students. We believe that secondary students have a wider scope of interests and are not as focused as computing professionals or professionals-to-be.

Here, we discuss some sample cases which can be generalized to others. For example, we found that the results of Case 1 and Case 3 are very close to the users' expectations. The recommended category for Case 1 is software, which matches the category in which the user is most interested. In addition, the computer, software and Internet categories have large overlapping areas. At the same time, the IAA can also filter out the uninterested categories, such as finance and Hi-Fi.

Another example is Case 2, where we found that the IAA recommends the Software category with the approximated result but the correct category with the exact result. In this case, we found that the Hi-Fi category has a relatively large overlapping area. This is because the number of keywords in Hi-Fi was too small and the advertisement profile was close to zero. Under this circumstance, a misleading result is produced. In our design, we used the advertisement profile as a control to prevent falling into the dislike situation. Therefore, the overlapping area of Hi-Fi can be reduced if the profile has stronger category characteristics, far from zero.

Figure 10. Average $P_c$ of users evaluating the IAA prototype.

6. Conclusion 

The rapid growth in the number of Internet users attracts advertisers to post their advertisements on the Internet. The probabilistic selection algorithm has not been satisfactory [4], while other advertising agents are unable to guarantee quality because of insufficient and unstable user information. We took this opportunity to develop a new advertising agent based on user information. By using the keyword knowledge base and the Gaussian curve transformation, we can determine users' interests effectively and efficiently by comparing the advertisement and user profiles.

A prototype of the Intelligent Advertising Agent has been developed with Java and Oracle. From our evaluations, we observed that about 80% of the test cases succeeded in predicting the category in which the users are most interested.
Abstract

Constraints that limit accurate targeting of advertising in traditional media may not hold in cyberspace. This paper presents a model for effectively and efficiently targeting hypermedia-based banner advertisements in an online information service. The model takes advantage of information technology to micro-target banner advertisements based on individual characteristics of users. A simple version of the model, which has the virtue of ease of development, is presented. Enhancements are also proposed. These require more effort to develop, but may lead to even more precise targeting of advertisements. Implementation of this framework may benefit both online advertisers and online consumers.

1. Introduction 

Cyberspace is a rapidly growing new medium for commerce. To date, a great deal of industry attention has focused on electronic transactions over the Internet. Although rapid growth is predicted over the next few years [10, 17, 21], actual sales thus far have been only moderate: users appear to regard the Internet primarily as a source of product information; when it comes time to pay, they prefer to buy offline by more conventional means [12, 14].

Responding to consumers' desire for information, 
businesses in large numbers have developed sites on the 
Worldwide Web(WWW or Web). Most commercial Web 
sites describe the firm and its products and/or services, and 
many offer opportunities for visitors to the Web site to 
provide feedback and ask for specific information. As well, 
some Web sites collect information from visitors in order to 
improve future offerings. Some sites also support ordering 
and payment. The interactive potential of Web sites is
particularly exciting, as it facilitates relationship marketing
and customer support, eliminating the obstacles of
geography and time [14, 22]. Not surprisingly, then,
industry and scholarly research has recently focused on
making Web sites more appealing and useful to visitors [1 3].
However, a Web site can only be effective if current and
prospective customers visit it. Attracting this audience is
currently a major challenge.



In this paper, we address the challenge of attracting a 
defined target audience to a Web site via banner advertising.
We propose a framework for effectively targeting banner 
advertising in an electronic marketplace in a manner that 
benefits both advertisers and consumers. It allows 
advertisers to reach consumers who are more likely to be 
interested in the products and/or services offered by the 
company, and exposes consumers to information about 
products and services that they are likely to be interested in 
purchasing. Although the framework is discussed in terms 
of the Internet, we believe it will be relevant to whatever 
form the "information superhighway" eventually assumes. 
The framework takes advantage of the capabilities afforded 
by information technology for collecting and processing data 
about users. The next section examines trends in the 
electronic marketplace. Subsequently, the current state of 
advertising in this medium is discussed. Thereafter, a 
framework for targeting banner advertising, supported by 
appropriate information technologies, is proposed. Finally,
opportunities for further research are discussed.

2. Marketing and Advertising in an Evolving 
Electronic Marketplace 

The Internet began in the early 1970s as a US 
government research project designed primarily for the 
needs of the military. It expanded in the 1980s to serve the 
international academic and research communities [19, 23]. 
In the 1990s, businesses began to appear on the Internet.
Although accurate estimates are obsolete as soon as they are
made, it is clear that today tens of millions of people have
access to the Internet [16] through over 100,000 computer
networks in 150 countries, and the numbers continue to
increase [14]. Two types of developments are particularly
noteworthy with regard to this growth. 

First, a large and ever expanding number of affluent, 
educated consumers are using the Internet [11]. This 
concentration of very desirable consumers has led to a surge 
in commercial interest. Prior to 1990, nodes on the Internet 
were predominantly academic institutions. In 1990, about 
1,000 businesses had Internet connections. By June 1995, 
over 21,000 businesses were online, and the growth in 
commercial connectivity shows no sign of slowing [8]. 

Second, the emergence of the hypermedia-based WWW, 
together with point-and-click multimedia interfaces such as
Netscape, have greatly increased usability of the Internet for
persons without extensive computer training. The
development of "applet" technology, such as Java, which
allows programs to run on a variety of platforms, increases 
the transparency of various Internet services. In other 
words, as technology continues to evolve, it is no longer an 
obstacle to, but an enabler of, electronic commerce. 

In this environment, companies are seeking ways to use
the Internet effectively [1, 3, 13, 22]. One active area in
electronic commerce involves using the Internet as a
medium to communicate persuasive product and service 
information via advertisements. These take various forms, 
the most common of which are corporate Web sites and 
banner advertising. We define a banner advertisement as: 

• paid communication (via text, graphics, video and/or
audio) of information about an organization and/or its
products and services 

• by an identified sponsor 

• embedded within, and visually distinct from, 
information provided by an online service

• with hypermedia links to the sponsor's Web site. 

We distinguish banner advertising from simple hypermedia 
links (paid or not) to commercial Web sites: banner 
advertising conveys a message even if the user does not 
follow the link; simple links can only convey a message if
the user follows the link. Banner advertisements are also 
distinct from what [14] refer to as "flat ads," single page 
advertisements that do not contain hypermedia links. In this 
paper, we restrict our discussion to banner advertising that 
appears in the course of users' browsing and searching 
activities on information services, such as Yahoo! 
(http://www.yahoo.com) and Excite
(http://www.excite.com), that provide an entry point to
Internet resources. Appendix 1 shows a banner
advertisement by the Saturn automobile company.

Scant attention has been paid to banner advertising by 
researchers. This may be because banners seem relatively 
insignificant, especially when compared with the interactive 
richness of Web sites. Technical specifications for banner 
advertisements severely limit creative options and preclude 
any consumer-firm interaction beyond the consumer's 
selection of the hypermedia link to the associated Web site 
(Excite, for instance, specifies that "all banners are 468x60 
pixels, gif format only, maximum file size is 7k" [9]).
Banner advertisements are, however, very important and 
interesting when viewed as part of a system that converts 
browsers and searchers into Web site visitors and, 
ultimately, customers. In their model of this conversion 
process, Berthon, Pitt and Watson [3] identify a sequence of 
tasks. First, users must be made aware of the Web site, then 
they must be attracted to and locate the site. Once users 
have found the Web site, the task is to turn that hit into a 
precisely targeting advertisements based on characteristics
and behavior of individual users of information services. 
Moreover, such targeting can be more precise than the 
targeting possible in traditional media. For example, visitors
to a "Travel" page on an information service may be good
targets for an advertisement for discount airfares, as would
readers of the Travel section of a newspaper. But the fact
that the online visitors have made a series of decisions and
taken a series of actions (i.e., selecting only a subset of
highlighted links within a hierarchical menu of categories)
to reach the Travel page, rather than some other page (e.g.,
the Home Decorating page), suggests they may have a
greater interest in travel than, say, readers who
unintentionally come upon the Travel section of a
newspaper and decide to read it. Since these exposures are
more likely to be target audience members, attraction
effectiveness can be improved. Targeting individual users
should also lead to fewer wasted exposures, since
the advertisement would not be shown to users who have not
reached the Travel page, thereby improving attraction
effectiveness. (See Appendix 2 for a similar example.)

At present, targeting of banner advertising does not 
always occur. For example, Appendix 3 shows an 
advertisement for Honda that appeared when Organic 
Gardening was selected from a hierarchical menu of 
categories. People interested in organic gardening may not 
be the best prospects for automobiles, as they are likely to be 
more environmentally sensitive than the general population 
and may feel that cars unnecessarily harm the environment.

Nevertheless, online information services do currently 
provide some targeting capability. As of August 1996, both 
Yahoo! [24] and Excite [9] offered advertisers three options: 
general rotation, geographic or content targeting, and
keyword-based targeting. With "general rotation," banner 
advertisements rotate randomly through user searches and 
browsing on the site. The Honda advertisement that 
appeared on the Organic Gardening page in Appendix 3 was 
probably in general rotation. Restricted rotations permit 
advertisers to purchase space in specified content areas or by 
geographic region. For example, financial institutions can 
limit the exposure of their banner advertisements to users 
searching or browsing Business categories, and Canadian 
advertisers can choose to have their banner advertisements 
shown only to users who are searching or browsing in the 
Yahoo! Canada site. These two options are analogous to the 
targeting offered by traditional media such as newspapers, 
magazines, television, and radio [4]. 

The third option, keyword-based targeting, makes greater
use of the targeting potential of information services. A
company can buy keywords so that whenever a user enters
one of those keywords during a search, s/he will be exposed
to the company's banner advertisement. This ensures that
the banner advertisement is presented only to people with a
demonstrated interest in the area. For instance, a marketer
of golf equipment might buy the keyword "golf." Every
time a user enters "golf" in a search, a banner advertisement
for the equipment would appear. This is analogous to the
more precise targeting provided by magazines.
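
As a rough sketch only, keyword-based targeting can be pictured as a lookup from purchased keywords to banners. The mapping contents, the banner names, and the function banners_for_query are hypothetical illustrations; they are not a description of how Yahoo! or Excite implemented this option.

```python
from typing import Dict, List

# Hypothetical table of purchased keywords and the banners bought for them.
KEYWORD_ADS: Dict[str, List[str]] = {
    "golf": ["golf-equipment-banner"],
    "travel": ["discount-airfare-banner"],
}


def banners_for_query(query: str) -> List[str]:
    """Return banners whose purchased keyword appears as a term in the query."""
    matched: List[str] = []
    for term in query.lower().split():
        for banner in KEYWORD_ADS.get(term, []):
            if banner not in matched:  # show each banner at most once
                matched.append(banner)
    return matched
```

A query containing no purchased keyword returns an empty list, in which case a service would presumably fall back to general rotation.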

While these are useful strategies, they fail to take full 
advantage of the targeting potential of banner advertising. 
Current technology provides the capability to develop 
sophisticated and detailed profiles of individual users of 
information services based on individual characteristics and 
past patterns of behavior in using the information service. 
The next section proposes and describes informally two 
versions of a model for targeting banner advertising by using 
the information technology on which an online information 
service is built.

3. A Model for Targeted Advertising 

In traditional media, the quality of the information 
available constrains an advertiser's ability to target 
advertising effectively and efficiently. For example, many 
media buying decisions are based on data provided by 
research bureaus such as the Audit Bureau of Circulations 
(ABC), Business Publication Audit of Circulation (BPA), 
Arbitron, and AC Nielsen, which collect data on the 
demographics and media habits of consumers, and 
sometimes on product usage and brands [4]. These survey 
data are cross-tabulated to develop a profile of the audience 
of each media vehicle. The audience profile is then 
compared to the target audience profile identified by the 
advertiser to determine where there is a good match. For 
instance, an automobile manufacturer might identify the
target audience for an advertisement for a particular model 
of car as middle-income females, 18 to 34, with busy 
lifestyles. Based on research bureau data, as well as the 
experience and judgement of the media planner, media 
vehicles with good reach in that demographic group would 
be chosen. Realistically, though, this type of targeting is 
usually very approximate. For instance, no matter how well 
the media vehicle audience profile matches the target 
audience profile, it is likely that only a portion of the 
audience would be in the market for a new car. 

Online banner advertising may be able to overcome this
problem. It is possible to target users very precisely because
data can remain associated with individuals, so advertisers 
can select exactly the users to whom they wish their 
advertising to be exposed. It may be possible, for example, 
to identify which users will be in the market for a new car in 
a particular year. The remainder of this section describes 
two versions of a model for targeting banner advertising by 
taking advantage of the technological capabilities of the 
online environment. The model is designed to be
appropriate for use by information services which sell
advertising space.

3.1. Basic version

The basic version of the model requires that users be
assigned unique identifiers (e.g., user accounts) when they
first connect to the information service. Subsequently, they
provide these identifiers each time they connect. Users also
complete an online questionnaire the first time they use the
information service. (Incentives to complete the
questionnaire may be provided by informing users that the
information will be used to filter out advertising for products
in which they are likely not to be interested.) The
questionnaire allows data to be collected on several
dimensions, including: (1) demographic attributes such as
geographic location, income, family lifecycle stage,
occupation, and sex; (2) psychographic attributes such as
travel patterns and hobbies; and (3) product and brand usage
attributes. This element of the basic model permits a banner
advertisement to be directed to users (and only those users)
who fit certain criteria, assuming data were collected on
relevant attributes. For instance, a banner advertisement for
baby strollers could reach parents of children under five
years old, and only individuals in that group.

In contrast, research bureau data uses demographic
correlates (e.g., males and females, 18 to 34) to identify
media vehicles that attract a relatively large proportion of
the people in the identified demographic group [4]. The
media vehicles thus chosen may miss some members of the target
group (e.g., older parents) and reach consumers not in the
target group (e.g., people who are between 18 and 34 but do
not have young children). Even audience data based on
cross-tabulations, while they supply information on more
variables, still cannot isolate individuals who are in the
target audience. (For example, research bureau data may
allow an advertiser to identify a magazine whose audience
includes a large number of people between 18 and 34 who
have young children, but there will still be some readers who
are not in the target market.)

The second element of the basic model involves eliciting
the target audience profile from advertisers. An advertiser
can specify a target audience using any number of attributes
about which data have been collected. These can be
expressed conjunctively and/or disjunctively. For example,
a specification may indicate that an advertisement is to be
presented to all users who (1) have household incomes over
$50,000, and (2) either work in a job that involves travel at
least four times per year or have travelled on vacation in at
least four of the past five years.
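
One way such a conjunctive/disjunctive specification could be encoded is as composable predicates over a user profile. The sketch below is illustrative only: the attribute names (household_income, work_trips_per_year, vacation_years_in_last_five) and the helper functions are assumptions introduced here, and a missing attribute is assumed to fail the test.

```python
from typing import Any, Callable, Dict

Profile = Dict[str, Any]
Predicate = Callable[[Profile], bool]


def greater_than(attr: str, bound: float) -> Predicate:
    """True when the attribute is present and strictly above the bound."""
    return lambda p: p.get(attr) is not None and p[attr] > bound


def at_least(attr: str, bound: float) -> Predicate:
    """True when the attribute is present and not below the bound."""
    return lambda p: p.get(attr) is not None and p[attr] >= bound


def all_of(*preds: Predicate) -> Predicate:
    """Conjunction: every predicate must hold."""
    return lambda p: all(pred(p) for pred in preds)


def any_of(*preds: Predicate) -> Predicate:
    """Disjunction: at least one predicate must hold."""
    return lambda p: any(pred(p) for pred in preds)


# The specification from the text: income over $50,000 AND
# (travels for work at least 4 times a year OR vacationed in 4 of the last 5 years).
traveller_spec = all_of(
    greater_than("household_income", 50_000),
    any_of(
        at_least("work_trips_per_year", 4),
        at_least("vacation_years_in_last_five", 4),
    ),
)
```

For instance, traveller_spec({"household_income": 60000, "work_trips_per_year": 5}) evaluates to True, while a profile with an income of 40000 fails regardless of travel.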

In this version of our model, the questionnaire
determines the data collected about each user. The content
of the questionnaire will vary depending on the nature of the
information service, expected users, and expected 
questionnaire strategy is that it is subject to two potential
types of bias. First, the questionnaire designer will want to 
identify as many user attributes relevant to potential 
advertisers as possible. As the number of attributes 
increases, so does the length of the questionnaire, creating 
the possibility of higher mortality in completing the 
questionnaire (especially since it may be more difficult to 
induce users to complete it because they are both physically 
and psychologically remote), thereby increasing the 
potential nonresponse bias [7]. Second, the questionnaire 
method is plagued with well-known problems, such as errors 
due to inaccurate recall, telescoping, social desirability 
concerns, and cognitive biases, as well as ambiguity, 
intimidation, confusion, and incomprehensibility [2]. 

In view of these potential problems, it is appropriate to 
enhance the model so that it does not rely on user self- 
reports, can accommodate changing user characteristics and 
preferences, and is less constrained by the choice of 
questions. Fortunately, information technology may provide 
assistance in each of these areas. 

Current technology allows a considerable amount of data 
about user search activities (both deliberate search and 
browsing) to be collected unobtrusively and analyzed to 
determine patterns. (We are dealing here only with the 
capabilities of the technology, not with the ethical issues 
such capabilities raise. However, we recognize that ethical 
issues must be considered explicitly in the design of systems 
based on our model. For instance, we believe users should 
be aware that such information may be collected, and how 
it may be used, and consent to this activity before using an 
information service.) In the enhanced model, we propose 
that patterns of search and browsing behavior exhibited by 
users while using an information service determine which 
advertisements are shown to that user during current or 
future sessions. In the remainder of this section, we provide 
a general overview of this approach. 

As before, this model relies on assigning a unique 
identifier to each user for recording her/his searching and 
browsing activities while using the information service. 
Each session constitutes a "record", consisting of data such
as: sites visited in order; pattern of navigation through a 
hierarchical category structure (as in Yahoo!); choice of 
search terms in keyword-based searches; and reaction to 
previously exposed targeted banner advertisements (e.g., 
which linked Web sites are selected and visited by the user 
and which ones ignored). The aggregate of such records for 
each user provides a profile from which preferences can be 
implicitly generated. As a simple example, if a user has 
made several searches using keywords such as "Atlantic 
salmon" and "fly fishing", and has visited the site of the 
Angling Club Lax-a of Iceland 
(http:/Avww.ismeimtis/fyr_sto 
s/he may be targeted for a banner advertisement for a fishing 



lodge in Alaska. However, if a user has previously been
exposed to the same or similar banner advertisements but 
has not visited linked Web sites when there was an 
opportunity to do so, s/he may not be shown these banner 
advertisements in future. 
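
The session "record" described above, and the rule of withholding banners a user has repeatedly ignored, might be sketched as follows. The paper leaves the actual data structure as a research question, so the field names, the function ads_to_suppress, and the min_exposures threshold are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class SessionRecord:
    """One session of a user, holding the kinds of data listed in the text."""
    user_id: str
    sites_visited: List[str] = field(default_factory=list)   # in visit order
    category_path: List[str] = field(default_factory=list)   # menu navigation
    search_terms: List[str] = field(default_factory=list)
    ads_shown: List[str] = field(default_factory=list)
    ads_clicked: List[str] = field(default_factory=list)


def ads_to_suppress(records: Iterable[SessionRecord], min_exposures: int = 2) -> Set[str]:
    """Ads shown at least min_exposures times across sessions and never clicked.

    The threshold is a hypothetical tuning parameter, not taken from the paper.
    """
    shown: Counter = Counter()
    clicked: Set[str] = set()
    for record in records:
        shown.update(record.ads_shown)
        clicked.update(record.ads_clicked)
    return {ad for ad, count in shown.items()
            if count >= min_exposures and ad not in clicked}
```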

This version of the model has the advantage of 
transparency. A user simply visits a service for whatever 
purpose s/he has in mind. Data are collected unobtrusively 
in the course of the visit. Moreover, the data reflect actual
user behavior, rather than attitudes, intentions, or reported 
behavior captured through a questionnaire. Hence, the 
quality of data derived from user behavior should be 
superior to that of questionnaire data, for purposes of 
targeting advertisements. 

A disadvantage of this model is the preparatory work 
involved on two fronts. First, it is not clear how to structure 
the data collected during visits so that useful information can 
easily be coded for storage and later extraction. Research is 
needed to develop useful and efficient coding mechanisms 
for storing such data as sequences of visits and search terms 
used. We expect this can be handled using conventional 
database structures such as relations (tables); however, the 
design of a relational database for this purpose is itself a 
distinct research issue. Second, the ability to store the 
required data does not necessarily mean useful information 
can be extracted from it. Further research is required to
determine the types of analyses that yield insights into user 
characteristics and preferences hidden in the data. 

The enhanced model should be used in conjunction with 
the basic model. A questionnaire may be very effective for 
identifying various demographic data relevant to advertisers 
but impossible to ascertain simply from users' online search
and browsing behavior. However, since demographic data 
has limitations for effectively targeting consumers of most 
products, the enhanced model of data collection may yield 
complementary data on preferences from patterns of online 
search and browsing behavior. 

The next section describes an implementation 
architecture for the basic version of the model. Extensions 
that support the enhanced version of the model remain as 
future research. 

4. An Implementation Architecture 

The architecture required to implement the basic version 
of the model consists of two parts: a data structure to
represent user profiles and target audience profiles, and an 
algorithm to select banner advertisements to display to a 
user. This section describes these components. 
4.1. Data Structure

To target banner advertisements, two types of profiles 
are needed: profiles describing users of the information 
service; and profiles describing the target audience for
advertisements, as defined by advertisers. Each profile can 
be modeled as a set of attributes. 

We assume there is a finite "universe" of attributes,
$A = \langle a_1, \ldots, a_N \rangle$, that may potentially characterize users or target
audience members.

4.1.1. User Profile. Each user, $u_i$, of the service can be
described by a record consisting of values of the universe of
attributes, $U_i = \langle a_1(u_i), \ldots, a_N(u_i) \rangle$, where $a_n(u_i)$ ($n = 1, \ldots, N$)
denotes the value of attribute $a_n$ for user $u_i$. This may be
implemented in a relational database in which a table is
defined whose primary key is a user identifier, and whose
remaining attributes are those in $A$. Each row in the table
contains the profile of one user. (A more elaborate data
structure is needed to support the enhanced model, since
data must also be kept about the pattern of behavior of a
user over one or more sessions.) Not all attributes need be
applicable or relevant to a particular user; hence, null values
are permitted.
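
A minimal sketch of such a table, using Python's built-in sqlite3 module with an in-memory database, is given here. The table name user_profile, the helper load_profile, and the row for "u9" are hypothetical; only the three attributes (age, income, number of dependents) and the values for "u1" follow the example used in this section. Columns other than the key are left nullable, as the text requires.

```python
import sqlite3
from typing import Any, Dict, Optional

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets rows be read by column name

# Primary key is the user identifier; the remaining columns are the attributes in A.
conn.execute(
    """
    CREATE TABLE user_profile (
        user_id    TEXT PRIMARY KEY,
        age        INTEGER,
        income     INTEGER,
        dependents INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO user_profile VALUES (?, ?, ?, ?)",
    [
        ("u1", 26, 34000, 0),
        ("u9", 52, None, None),  # hypothetical user who left two questions blank
    ],
)


def load_profile(db: sqlite3.Connection, user_id: str) -> Optional[Dict[str, Any]]:
    """Return the profile row for user_id as a dict, or None if unknown."""
    row = db.execute(
        "SELECT * FROM user_profile WHERE user_id = ?", (user_id,)
    ).fetchone()
    return dict(row) if row is not None else None
```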

A simple example serves to illustrate this structure. 
Consider a universe consisting of three attributes: age, 
income, and number of dependents. Suppose there are two 
users of a service. When those users have completed a 
profile questionnaire, the resulting data may be stored in a 
relational table as: 
USER

user_id   age   income   dependents
u1        26    34000    0
u2        45    54000    2

4.1.2. Target Audience Profile. A target audience profile
is associated with each banner advertisement. A profile may
be expressed as:

(1) A characterization of an "ideal" target audience
member.

Such an ideal can be described by a record consisting of
values of the universe of attributes, $T_j = \langle t_1, \ldots, t_N \rangle$, where
$t_n$ ($n = 1, \ldots, N$) is a specific value of attribute $a_n$. Some
values may be null, indicating that any values of those
attributes are permitted for the ideal; and/or

(2) A characterization of the "acceptable" target audience.
Generally, an advertiser is interested in reaching those
within specified ranges of the attributes of interest.
Given N attributes of interest, acceptability can be
thought of as a region in N-dimensional space. This
region can be defined by specifying ranges of acceptable
values for various attributes in the universe.

To illustrate, consider a simple example in which there
are two advertisements, each with a different target audience
profile, designated T1 and T2. The ideal target profile for T1
is users aged 35 with incomes of $50,000 (no restrictions on
number of dependents), while that for T2 is users aged 25
with incomes of $25,000 and no dependents. These profiles
are shown in the following relational table.

TARGET

ad_id   age   income   dependents
T1      35    50000    null
T2      25    25000    0
RANGE

ad_id   attribute   lower   upper
T1      age         20      50
T1      income      40000   60000
T2      age         20      30
T2      income      20000   30000

In this case, the acceptable profile can be depicted as a 
region in two-dimensional space. Figure 2 shows the profile 
for T1.

It is possible that both ideal and acceptable profiles could 
be generated by the same advertiser. By overlaying Figures 
1 and 2, shown in Figure 3, we note that the ideal point need 
not lie at the geometric center of the acceptable region. 
To handle measures of "distance" from an ideal, ranges of
values on relevant attributes can be replaced with advertiser- 
specified information about the acceptable distributions of 
values over attributes. For instance, an advertiser may 
specify a mean (ideal) and standard deviation for an attribute 
if "acceptability" is normally distributed about a central 
value. Other measures of central tendency and dispersion 
may be appropriate for attributes in which the range of 
acceptability is quantified differently. The data structure of 
the RANGE table can be modified to accommodate this 
additional complexity. 
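
The paper does not propose a specific metric, but one possible reading of the mean and standard deviation idea is a standardized (z-score) Euclidean distance from the ideal. The sketch below is only that: the function name, the treatment of missing values as infinitely distant, and the example standard deviations are assumptions made here.

```python
import math
from typing import Dict, Optional, Tuple

# attribute -> (mean, i.e. the ideal value, and standard deviation)
DistributionSpec = Dict[str, Tuple[float, float]]


def standardized_distance(user: Dict[str, Optional[float]],
                          spec: DistributionSpec) -> float:
    """Euclidean distance from the ideal, each attribute scaled by its std. dev.

    Assumption: a user with no value for a specified attribute is treated
    as infinitely far from the ideal.
    """
    total = 0.0
    for attr, (mean, std_dev) in spec.items():
        value = user.get(attr)
        if value is None:
            return math.inf
        total += ((value - mean) / std_dev) ** 2
    return math.sqrt(total)


# Ideal of age 35 and income 50000; the standard deviations are hypothetical.
example_spec: DistributionSpec = {"age": (35.0, 5.0), "income": (50000.0, 10000.0)}
```

Under this sketch a user exactly at the ideal has distance 0, and a user one standard deviation away on a single attribute has distance 1.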

4.2. Selecting Advertisements for Users

The primary challenge in effectively and efficiently 
targeting banner advertising is matching user profiles with 
target audience profiles. Figure 4 uses a data flow diagram 
to depict the matching process described below. 

When a registered user visits an information service, 
his/her profile is retrieved. This profile is then compared 
with the target audience profiles of banner advertisements 
currently being run by the information service. Each target 
audience profile is associated with a banner advertisement. 
For each target audience profile, if there is no match with 
the user profile, the associated advertisement is dropped 
from further consideration. 
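
The comparison step can be sketched as a filter over acceptable ranges. The sketch assumes inclusive bounds, treats an unspecified attribute as unrestricted, and treats a null user value on a restricted attribute as a non-match; none of these choices is stated in the text, and the function names are introduced here for illustration. The data are the sample USER and RANGE values from Section 4.1.

```python
from typing import Dict, List, Optional, Tuple

Bounds = Tuple[Optional[float], Optional[float]]  # (lower, upper); None = open
Ranges = Dict[str, Bounds]                        # attribute -> bounds
UserProfile = Dict[str, Optional[float]]


def matches(user: UserProfile, ranges: Ranges) -> bool:
    """True when every restricted attribute of the user falls within its range."""
    for attr, (lower, upper) in ranges.items():
        value = user.get(attr)
        if value is None:
            return False
        if lower is not None and value < lower:
            return False
        if upper is not None and value > upper:
            return False
    return True


def matched_ads(user: UserProfile, targets: Dict[str, Ranges]) -> List[str]:
    """Drop every advertisement whose target audience profile does not match."""
    return [ad_id for ad_id, ranges in targets.items() if matches(user, ranges)]


TARGETS: Dict[str, Ranges] = {
    "T1": {"age": (20, 50), "income": (40000, 60000)},
    "T2": {"age": (20, 30), "income": (20000, 30000)},
}
U1: UserProfile = {"age": 26, "income": 34000, "dependents": 0}
U2: UserProfile = {"age": 45, "income": 54000, "dependents": 2}
```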

After the comparisons are completed, a set of matched 
target audience profiles from a variety of advertisers 
remains. If this set is small, it may be feasible to show all 
the associated banner advertisements to the user during the 
session. In general, though, it will be necessary to select 
some subset of advertisements from the matched set to 
display to the user. We envision that the advertisers whose 
advertisements are in the matched set will compete for the 
opportunity to have their banner advertisements displayed 
to the user. 

The concept of acceptable regions in target audience 
profiles provides a basis for competition. Profiles 
accommodate the possibility that some users within the 
region of acceptability may be more desirable to an 
advertiser than others. Hence, a distance metric capturing 
the relative desirability of a user with respect to an ideal 
profile is possible. It is not the purpose of this paper to 
propose or evaluate metrics. However, recognizing a notion 
of distance allows the possibility for advertisers to "bid" for 
the opportunity to display an advertisement to a user. Such
bids would be determined by the advertiser, based on
variables such as the user profile (to determine the distance
from the ideal target audience profile) and advertising
budget. It may be feasible to automate this by having
software agents associated with each advertisement that
would calculate the distance measure for the user and
formulate a bid based on this, in addition to other
information such as whether the user had seen this
advertisement, or other advertisements for the same or
similar products, in previous sessions (information which
could be carried as part of the user profile).

Figure 4. Selecting Advertisements for Users

When bids are received, they can be ranked. The banner
advertisement corresponding to the winning bid is displayed
to the user. Other advertisements may be displayed
according to their ranking if there is an opportunity to
display additional advertisements (e.g., if the user engages
in several search or browse activities during a session).
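
Since bidding rules are explicitly left to future research, the following is only a toy sketch of how a software agent might turn a distance and an exposure history into a bid, and how the service might rank the results. The BiddingAgent class, its discount formula, and the halving for previously seen advertisements are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BiddingAgent:
    """Hypothetical agent acting for one advertisement."""
    ad_id: str
    max_bid: float  # the most the advertiser will pay for one exposure

    def bid(self, distance: float, seen_before: bool) -> float:
        # Assumed rule: the bid shrinks as the user moves away from the ideal,
        # and is halved if the user has already seen this advertisement.
        offer = self.max_bid / (1.0 + distance)
        return offer * 0.5 if seen_before else offer


def rank_ads(bids: Dict[str, float]) -> List[str]:
    """Order advertisement ids from highest to lowest bid (ties broken by id)."""
    return sorted(bids, key=lambda ad_id: (-bids[ad_id], ad_id))
```

The first element of the ranked list would be displayed first; later elements would be used if further display opportunities arise during the session.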

This architecture provides guidance for implementing 
the basic version of the model. We present next a simple 
example showing how the architecture operates. 

4.3. Example 

Consider the relational database tables USER,
TARGET, and RANGE presented earlier. Suppose first that
the user with profile u1 connects to the information service.
This user's profile, consisting of the database record <u1, 26,
34000, 0>, is retrieved from the USER table. Next, target
audience profiles T1 and T2 are retrieved from the TARGET
table. These identifiers determine the attributes whose
profile ranges have to be selected from the RANGE table.
Next, the age range for T1, namely (20, 50), is retrieved from
RANGE. Since the age value of u1 is 26, there is a match on
this criterion. So, the salary range for T1, (40000, 60000), is
retrieved. Since the user u1 does not match this criterion of
the target audience profile (salary is 34000), the
advertisement corresponding to the profile T1 will not be
shown to u1. Applying the same algorithm to target
audience profile T2, u1 would not be exposed to the
advertisement corresponding to T2, since the income of u1
(34000) is greater than the upper bound of 30000 specified
in the target audience profile T2. Thus the user with profile
u1 would not be exposed to any banner advertisements when
s/he used the information service. This is efficient, since
showing either banner advertisement to the user with profile
u1 would entail a cost and constitute a wasted exposure.

Suppose now that the user with profile u2 connects to the
information service. This user's profile is first retrieved
from the USER table. Next, the target audience profiles T1
and T2 are retrieved. Applying the matching algorithm, a
match will be found between u2 and the profile T1. (Note
that the ranges in our example do not specify restrictions on
number of dependents; hence, any values are permitted on
this attribute.) However, there is no match between u2 and
T2, since the income of u2 (54000) is beyond the upper bound
of income for T2 (30000). Hence, u2 will be exposed to the
advertisement corresponding to the target audience profile
T1.
This example does not show the full scope of the model,
since there is no case where there are two or more target
audience profiles that match a particular user profile. To
illustrate this, consider an additional target audience profile
having only the condition that user income must be at least
50000. This profile would require adding the following
record to the RANGE table: <T3, income, 50000, *>. Now
if the user with profile u2 connects to the information
service, a match with both T1 and T3 will be found. In this
case, the advertisers (or software agents) responsible for T1
and T3 will be contacted and provided with the profile u2.
Each advertiser (agent) will prepare a bid indicating how
much it is willing to pay to have the banner advertisement
corresponding to its profile exposed to the user with profile
u2. These bids are compiled and returned to the information
service, where they are ranked. If we allow that the user
will be shown only one advertisement, the one which placed
the highest bid will be chosen.
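
The extended example can be traced with a short sketch that stores the RANGE table as rows, with None standing for the open upper bound written as * above. The function names and the bid amounts are hypothetical; the user and range values are those of the example, and bounds are assumed to be inclusive.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

RangeRow = Tuple[str, str, Optional[float], Optional[float]]

USERS: Dict[str, Dict[str, float]] = {
    "u1": {"age": 26, "income": 34000, "dependents": 0},
    "u2": {"age": 45, "income": 54000, "dependents": 2},
}

# (ad_id, attribute, lower, upper); None marks an open bound.
RANGE_ROWS: List[RangeRow] = [
    ("T1", "age", 20, 50),
    ("T1", "income", 40000, 60000),
    ("T2", "age", 20, 30),
    ("T2", "income", 20000, 30000),
    ("T3", "income", 50000, None),
]


def matching_ads(user: Dict[str, float], rows: List[RangeRow]) -> List[str]:
    """Group RANGE rows by advertisement and keep ads whose every row is satisfied."""
    by_ad: Dict[str, List[RangeRow]] = defaultdict(list)
    for row in rows:
        by_ad[row[0]].append(row)
    matched = []
    for ad_id, ad_rows in by_ad.items():
        if all(
            user.get(attr) is not None
            and (lower is None or user[attr] >= lower)
            and (upper is None or user[attr] <= upper)
            for _, attr, lower, upper in ad_rows
        ):
            matched.append(ad_id)
    return matched


def winning_ad(matched: List[str], bids: Dict[str, float]) -> Optional[str]:
    """Pick the matched advertisement with the highest bid, if any matched."""
    if not matched:
        return None
    return max(matched, key=lambda ad_id: bids.get(ad_id, 0.0))


# Hypothetical bids returned by the advertisers (or their agents).
EXAMPLE_BIDS = {"T1": 4.0, "T3": 6.0}
```

With these assumed bids, u2 matches T1 and T3 and would be shown the advertisement for T3, while u1 matches nothing and is shown no advertisement.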

In summary, this model makes use of rich, multiattribute
data at the individual user level in determining whether each
one will be exposed to a particular banner advertisement.
This leads to more effective and efficient targeting than is
possible using strategies such as general rotation, which
does not use data at the individual level, and restricted
rotation or keyword search, which rely on only a single data
item about an individual in determining which banner
advertisement(s) to present to the user. It is also more
effective and efficient than targeting in the traditional media,
which does not use any data at the individual level.

5. Future Research 

This paper has presented a framework for leveraging 
information technology to target online banner advertising 
more effectively to benefit both users (who would be 
exposed only to advertising that is very probably of interest 
to them) and advertisers (whose advertisements would reach 
only those users who fit the target audience profile). This 
framework is, however, merely a starting point. Additional 
research on several fronts is needed before its potential can 
be realized. Several specific research concerns have already 
been noted. In addition, there are more general issues. 

First, a system supporting the basic version of the model, 
based on the implementation architecture presented in this 
paper, should be implemented. 

Second, an implementation supporting the enhanced 
version of the model is needed. This will require research to 
develop a more sophisticated database structure that can 
preserve users' searching and browsing behavior over time. 
In addition, techniques for detecting patterns of behavior are 
needed. 

Third, both theoretical and empirical research is needed 
to explore agent bidding in the context of the framework 
proposed in this paper. 

Fourth, empirical work needs to be done to evaluate the 
relative effectiveness and efficiency of this framework. A 
priority should be to compare (1) the basic version of the 
model, (2) the enhanced version of the model, and (3) 
existing approaches to targeting advertisements. For 
instance, it would be interesting to test whether placing a 
banner advertisement on a relevant page (e.g., an 
advertisement for a new movie on the Entertainment page of 
an information service) would be more or less effective than 
directing the same advertisement to individual users selected 
on the basis of their answers to a questionnaire (i.e., the 
simple version of the model) or their search and browsing 
behavior (i.e., the enhanced version of the model). 

Finally, the utility of this framework in other online 
contexts should be investigated. For instance, this approach 
could be used in developing Web sites that are more useful 
to visitors. Visitors with different profiles could 
automatically be shown different pages more likely to be of 
interest to them, eliminating the need for them to search the 
Web site for the information they desire. 

6. Conclusion 

Cyberspace is a new medium for advertising. In 1994, 
Edwin Artzt, chairman of Procter & Gamble, the largest 
advertiser in the United States, warned advertising agencies 
to "get their interactive act together" [6, p. 75]. As the 
advertisements in Appendices 1 and 3 show, even major 
advertisers and their agencies may not be taking full 
advantage of the opportunity to target their online banner 
advertising. The information technology that makes the 
WWW possible also allows the unobtrusive collection of 
detailed information about user interests based on their 
online searching and browsing. Advertisers should not 
assume that the same constraints that make media planning 
in traditional media a very inexact science also apply to 
online advertising. 